gh-140795: fetch thread state once on fast path for critical sections #141406
kumaraditya303 merged 3 commits into python:main
How much of a difference does this make to
It is ~5% faster on macOS with this change.
GH-141406 improved performance by fetching the thread state only once and storing it in a variable on the stack. This instead puts the thread state in the PyCriticalState struct (also a temporary variable on the stack), bringing the public and private implementations closer together.
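To illustrate the struct-based variant described above, here is a minimal sketch in which the per-section state struct carries the thread state pointer from begin to end. All names (`demo_tstate`, `demo_cs`, `demo_get_tstate`, and so on) are hypothetical stand-ins, not CPython's actual types or functions, and no real locking is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread state; not CPython's PyThreadState. */
typedef struct { int depth; } demo_tstate;

/* Hypothetical per-section state that lives on the caller's stack. */
typedef struct {
    demo_tstate *tstate;  /* cached at begin, reused at end */
    int active;
} demo_cs;

static demo_tstate demo_current;
static int demo_lookups = 0;  /* counts simulated thread-state lookups */

/* Stands in for a thread-state lookup that costs a function call. */
static demo_tstate *demo_get_tstate(void) {
    demo_lookups++;
    return &demo_current;
}

static void demo_begin(demo_cs *cs) {
    cs->tstate = demo_get_tstate();  /* the only lookup */
    cs->tstate->depth++;
    cs->active = 1;
}

static void demo_end(demo_cs *cs) {
    assert(cs->tstate != NULL);
    cs->tstate->depth--;             /* reuses the cached pointer */
    cs->active = 0;
}

/* Runs one begin/end pair and returns how many lookups it needed. */
static int demo_roundtrip(void) {
    demo_cs cs;
    demo_lookups = 0;
    demo_begin(&cs);
    /* ... protected work would go here ... */
    demo_end(&cs);
    return demo_lookups;
}
```

Because the pointer travels inside the struct, the begin and end helpers keep a single-argument shape, which is one plausible reason this layout could bring two implementations closer together.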
I don't understand why this is only done for
Putting the thread state in a temp stack variable for all users seems to work well: https://github.com/python/cpython/pull/146066/files
I wouldn't necessarily expect it to help non-stdlib users. There are basically three cases:
- In the main executable, everything gets inlined and the access to
- In non-stdlib extensions, the calls are not inlined. The access to
- In extensions like
Currently, on the fast path for critical sections, where the objects are not locked, the thread state is fetched twice: once when acquiring the critical section and once when releasing it. This PR fetches it once and stores it in a temporary variable on the stack, so the fast path needs only a single thread state lookup.
This should help performance in shared modules such as ssl, where thread state access is slower because of an extra function call to _PyThreadState_GetCurrent.
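The change described above can be sketched as a before/after comparison. This is only an illustration under stated assumptions: `mock_tstate`, `mock_get_tstate`, and the `begin_*`/`end_*` helpers are hypothetical names invented here, a counter stands in for the cost of the extra function call, and the actual CPython macros and locking are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not CPython's real types or functions. */
typedef struct { int depth; } mock_tstate;

static mock_tstate the_tstate;
static int fetch_count = 0;  /* counts simulated thread-state lookups */

/* Stands in for a lookup that costs an extra function call. */
static mock_tstate *mock_get_tstate(void) {
    fetch_count++;
    return &the_tstate;
}

/* Before: begin and end each look up the thread state themselves. */
static void begin_v1(void) { mock_get_tstate()->depth++; }
static void end_v1(void)   { mock_get_tstate()->depth--; }

/* After: the caller fetches once and passes the pointer to both. */
static void begin_v2(mock_tstate *ts) { assert(ts != NULL); ts->depth++; }
static void end_v2(mock_tstate *ts)   { assert(ts != NULL); ts->depth--; }

/* Old shape: returns the number of lookups for one begin/end pair. */
static int fetches_twice(void) {
    fetch_count = 0;
    begin_v1();
    /* ... protected work would go here ... */
    end_v1();
    return fetch_count;
}

/* New shape: one lookup, kept in a temporary on the stack. */
static int fetches_once(void) {
    fetch_count = 0;
    mock_tstate *ts = mock_get_tstate();
    begin_v2(ts);
    /* ... protected work would go here ... */
    end_v2(ts);
    return fetch_count;
}
```

In this sketch `fetches_twice()` performs two lookups and `fetches_once()` performs one. The saving matters most when the lookup itself is not inlined, as the description says is the case for shared modules.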